Abstract



We study the frequency distribution of family names. From a common
database, we count the number of people who share the same family name;
this number is the size of the family. We find that (i) the total number of different
family names in a society scales as a power-law of the population, (ii) the 
total number of family names of the same size decreases as the size increases 
with a power-law and (iii) the relation between size and rank of a family name 
also shows a power-law. These scaling properties are found to be consistent 
for five different regional communities in Japan. 

Scaling laws have played an important role in science for the past several decades.
Diverse systems in nature have been found to exhibit scaling laws and self-similarity
without fine tuning of external parameters, a phenomenon known as self-organized criticality.
A simple model proposed by Bak et al. shows that the minimal ingredients of this scaling
behaviour are a large number of degrees of freedom and nonlinear interactions between
them. Human societies also show complexity that meets these conditions for self-organized
criticality. In this context, many human activities, including word frequency, traffic
flow [4], economics, population growth, city growth, the internet, citation frequency
and war distribution, have been reported to show scaling behaviour.

Here, we study the frequency distribution of family names in Japanese societies. We define
a family as a group of people who share the same family name, i.e., different families are
identified by their family names. We also define the size of a family, $s$, as the number of people
in that family. We rank families by their size from the biggest family to the smallest.
For example, the biggest family in the town of Fuso is "Senda" with family size $s(\mathrm{Senda}) = 296$,
so its rank is $r(\mathrm{Senda}) = 1$. The second biggest is "Kondo" with size $s(\mathrm{Kondo}) = 229$ and
rank $r(\mathrm{Kondo}) = 2$, and so on [11]. In this way we measure the size and the rank of all
families.
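The definitions of size and rank can be sketched in a few lines of Python. This is only an illustration: the function name `size_and_rank` and the toy directory are assumptions, and only the sizes of "Senda" and "Kondo" are taken from the text.

```python
from collections import Counter

def size_and_rank(names):
    """Map each family name to (size, rank); rank 1 is the largest family."""
    sizes = Counter(names)  # family name -> number of people sharing it
    # most_common() lists names by decreasing size; ties keep first-seen order
    ranked = sizes.most_common()
    return {name: (s, r) for r, (name, s) in enumerate(ranked, start=1)}

# Toy directory: the Senda and Kondo sizes are from the text, "Other" is made up.
directory = ["Senda"] * 296 + ["Kondo"] * 229 + ["Other"] * 3
table = size_and_rank(directory)
print(table["Senda"])  # (296, 1)
print(table["Kondo"])  # (229, 2)
```

Families of equal size receive consecutive ranks here, which is one possible convention; the text does not say how ties were handled.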

We analyze the telephone directories of five regional communities in Japan: the town of
Haruhi, the town of Fuso, the city of Inazawa, the city of Kasugai and 1/3 of the city of Nagoya. The
directories were published in 1998 by the communications company "NTT". The total
numbers of customers $S$ appearing in these directories are 1634, 7775, 23365, 65988 and
177267, respectively. First, we count the number of different family names, $N$, appearing in
the directories. In Fig. 1 we plot $N$ versus $S$ and find that

$$N \sim S^{\chi}, \tag{1}$$

with an exponent $\chi = 0.65 \pm 0.03$.
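An exponent of this kind is commonly estimated by a straight-line fit in double logarithmic coordinates. The sketch below shows one such fit using only the standard library. The populations are those quoted in the text, but the counts `N_synthetic` are generated from an exact power law for illustration and are not the measured values; the prefactor 3.0 is arbitrary.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = slope * log(x) + intercept."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

# Populations S are from the text; N_synthetic is NOT the measured data.
S = [1634, 7775, 23365, 65988, 177267]
N_synthetic = [3.0 * s ** 0.65 for s in S]

chi, log_prefactor = fit_power_law(S, N_synthetic)
print(f"fitted exponent: {chi:.3f}")
```

With real counts, the scatter of the points around the fitted line would determine the quoted uncertainty.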

Next, we investigate the scaling properties of two different quantities: (i) the distribution
of the family size, $n(s)$, which is the number of families of the same size $s$, and (ii) the relation
between the size and rank of a family, $s(r)$, which forms the so-called Zipf plot. The two
quantities are complementary in the sense that $n(s)$ mainly reflects the scaling property of
the smaller families while $s(r)$ highlights the scaling property of the bigger families. We
find power-law scaling for both quantities, consistent across all five regions
investigated.
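As a hedged illustration of how the two quantities follow from the same raw list of names, the sketch below computes both. The function names and the toy data are assumptions made for this example.

```python
from collections import Counter

def size_distribution(names):
    """n(s): number of distinct family names shared by exactly s people."""
    family_sizes = Counter(names)          # name -> s
    return Counter(family_sizes.values())  # s -> n(s)

def zipf_curve(names):
    """s(r): family sizes in decreasing order; list index 0 is rank 1."""
    return sorted(Counter(names).values(), reverse=True)

# Toy data: one family of 5, two of 2, three of 1.
names = ["A"] * 5 + ["B"] * 2 + ["C"] * 2 + ["D", "E", "F"]
n_of_s = size_distribution(names)
s_of_r = zipf_curve(names)
print(dict(n_of_s))  # {5: 1, 2: 2, 1: 3}
print(s_of_r)        # [5, 2, 2, 1, 1, 1]
```

Most entries of `n_of_s` come from small `s`, whereas the head of `s_of_r` is dominated by the few largest families, which is why the two views complement each other.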

We measure the distribution $n(s)$ for each community, shown in Fig. 2a on a double
logarithmic scale. It shows a clear power-law behaviour with the same exponent for all five
communities. We suggest the following scaling form for $n(s)$ [12]:

$$n(s) = A\, f\!\left(\frac{s}{s^*}\right), \tag{2}$$

where the scaling function $f(x)$ behaves as $f \sim x^{-\tau}$ for $x < 1$ and $f = 1$ for $x \geq 1$. Here
$s^*$ is a characteristic family size at which $n$ becomes one, i.e. $n(s^*) = 1$, which in turn gives
$A = 1$. In Fig. 2b we collapse the data using the scaling form of Eq. (2) with an additional
scaling law $s^* \sim S^{\alpha}$ and the scaling exponent $\alpha = 0.37 \pm 0.03$. A linear fit of the collapsed
scaling function yields $\tau = 1.75 \pm 0.05$.
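The data collapse described above amounts to rescaling the horizontal axis of each histogram by its own characteristic size. A minimal sketch follows, assuming $s^* = S^{\alpha}$ with unit prefactor and $A = 1$; the function names and the synthetic histograms are illustrative and do not reproduce the directory data.

```python
def collapse(n_of_s, S, alpha=0.37):
    """Rescale a histogram {s: n(s)} to points (s / s_star, n(s))."""
    s_star = S ** alpha  # assumption: s* = S**alpha with unit prefactor
    return sorted((s / s_star, n) for s, n in n_of_s.items())

def synthetic_histogram(S, tau=1.75, alpha=0.37):
    """Ideal histogram obeying n(s) = (s / s*)**(-tau) for s <= s*."""
    s_star = S ** alpha
    return {s: (s / s_star) ** (-tau) for s in range(1, int(s_star) + 1)}

# After rescaling, histograms for different S lie on the same curve x**(-tau).
curves = {S: collapse(synthetic_histogram(S), S) for S in (7775, 177267)}
for S, points in curves.items():
    x, n = points[0]
    print(S, round(n * x ** 1.75, 6))  # close to 1 for every point
```

With measured histograms the points would scatter around a common curve, and the quality of the overlap is what selects the best value of the exponent.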
From the normalization condition,

$$\int^{s^*} n(s)\, ds = N, \tag{3}$$

and the scaling form for $n(s)$ [Eq. (2)] we obtain the relation $s^* \sim N^{1/\tau}$. This scaling,
combined with our finding $N \sim S^{\chi}$, gives

$$\alpha = \frac{\chi}{\tau}. \tag{4}$$

This scaling relation is consistent, within error bars, with the measured exponents.
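The intermediate step can be written out explicitly. The sketch below assumes that the integral starts at the smallest possible family size, $s = 1$, and that $s^* \gg 1$ with $\tau > 1$; neither assumption is stated in the text.

```latex
\begin{align}
N &= \int_{1}^{s^*} \left(\frac{s}{s^*}\right)^{-\tau} ds
   = (s^*)^{\tau}\, \frac{1 - (s^*)^{1-\tau}}{\tau - 1}
   \approx \frac{(s^*)^{\tau}}{\tau - 1}, \\
s^* &\sim N^{1/\tau} \sim S^{\chi/\tau}
   \quad\Longrightarrow\quad \alpha = \frac{\chi}{\tau}.
\end{align}
```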

In Fig. 3a we plot the family size $s$ versus rank $r$ on a double logarithmic scale. Each curve
shows a crossover from one power-law regime with exponent $\phi_{I} = 0.67 \pm 0.03$ to
another, steeper power-law decay with exponent $\phi_{II} = 1.33 \pm 0.03$ at the characteristic rank
$r^*$, which also scales as $r^* \sim S^{\alpha'}$. We propose the following scaling form for $s(r)$:

$$s(r) = r^*\, g\!\left(\frac{r}{r^*}\right), \tag{5}$$

where the scaling function $g$ behaves as $g \sim x^{-\phi_{I}}$ for $x \ll 1$ and $g \sim x^{-\phi_{II}}$ for $x \gg 1$.
In Fig. 3b we collapse the data using the scaling form of Eq. (5); the best fit is
obtained when $\alpha' = 0.5 \pm 0.05$.

The two quantities $n(s)$ and $r(s)$ are related by an integral equation:

$$r(s) = \int_{s}^{\infty} n(s')\, ds' \sim s^{1-\tau}. \tag{6}$$

By inverting this relation we obtain the scaling relation $s(r) \sim r^{1/(1-\tau)}$, which gives the
exponent

$$\phi_{II} = \frac{1}{\tau - 1}, \tag{7}$$

because the scaling exponent $\tau$ is measured for small $s$, i.e. for $r > r^*$. Note that Eq. (7)
is well satisfied by our results. The fact that the crossover point $r^*$ scales as $S^{0.5}$ suggests
that the sampling of the population is random, so that the relative deviation of the probability
decreases as $S^{-0.5}$ as the number of data points $S$ increases.
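The two scaling relations, Eq. (4) and Eq. (7), can be checked against the quoted measurements with a short calculation. The error propagation below is first order and assumes independent uncertainties on $\chi$ and $\tau$; this is an assumption of the sketch, not a statement from the text.

```python
import math

# Measured exponents and uncertainties quoted in the text
chi, d_chi = 0.65, 0.03
tau, d_tau = 1.75, 0.05

# Eq. (4): alpha = chi / tau, with relative errors added in quadrature
alpha_pred = chi / tau
d_alpha = alpha_pred * math.sqrt((d_chi / chi) ** 2 + (d_tau / tau) ** 2)

# Eq. (7): phi_II = 1 / (tau - 1), so |d(phi_II)/d(tau)| = 1 / (tau - 1)**2
phi2_pred = 1.0 / (tau - 1.0)
d_phi2 = d_tau / (tau - 1.0) ** 2

print(f"alpha  = {alpha_pred:.2f} +/- {d_alpha:.2f}  (measured 0.37 +/- 0.03)")
print(f"phi_II = {phi2_pred:.2f} +/- {d_phi2:.2f}  (measured 1.33 +/- 0.03)")
```

Both predicted values fall within the quoted error bars of the measured exponents. The prediction for the second exponent is sensitive to the uncertainty in $\tau$ because $\tau - 1$ is small.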

To test the role of the communities in the observed scaling behaviour, we randomly
select a population and repeat our analysis for the extracted data set. Figure 4 shows the
distributions for the randomly chosen populations $S$ = 2189, 6566, 19696, 59089 and 177267
drawn from the biggest data set, that for 1/3 of the city of Nagoya. They show scaling behaviour
very close to that of Figs. 1 to 3. This experiment suggests that the families are distributed randomly in
the town without spatial correlation. Such scaling universality in the family structure of
contemporary societies could be explained if the time scale characterizing the
migration of the population in a community is much shorter than the time scale associated with
the reproduction of a family name.
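The subsampling experiment can be sketched as follows. The population here is synthetic (name `i` is carried by `1000 // i` people, a rough heavy-tailed stand-in), and the function name and seed are assumptions; the actual directory data are not reproduced.

```python
import random

def distinct_names_in_sample(names, sample_size, seed=0):
    """Draw people at random without replacement and count distinct names, N."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(names, sample_size)
    return len(set(sample))

# Synthetic population: 1000 family names with heavy-tailed sizes.
population = [f"name{i}" for i in range(1, 1001) for _ in range(1000 // i)]

n_small = distinct_names_in_sample(population, 500)
n_large = distinct_names_in_sample(population, 2000)
n_full = distinct_names_in_sample(population, len(population))
print(n_small, n_large, n_full)
```

Repeating this for a sequence of sample sizes and fitting the counts on a double logarithmic plot gives the analogue of Fig. 4a for the synthetic population.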

The scaling exponents $\tau = 1.75$, $\phi_{I} = 0.67$ and $\phi_{II} = 1.33$ differ from Zipf's
result on word frequency, where the exponents are $\tau = 2.0$ and $\phi = 1.0$. The power-law
relation between $N$ and $S$ and its exponent $\chi = 0.65$ observed in the family-name distribution
seem to be nontrivial. One may expect this scaling law to break down if the number of available
family names in a society is too small compared to the population. Cohen et al. found
that this situation occurs in the word frequency distribution: for very large $S$, $N(S)$
approaches a plateau. They found that the exponent $\chi$ for the number of different words in a
text is also a function of the length of the text. The same is true for societies where family
names are strictly inherited from fathers to sons without any creation of new family names.
In fact, the expected number of sons per father is one in a stationary
population. Then the survival probability $P(t)$ of a family name after $t$ generations decreases
as $P(t) \sim t^{-0.5}$. As a result, after many generations, only a few family names will dominate
the whole population in the society. This is the situation in countries where the creation of
new family names has been strictly restricted for many generations, such as Korea. The
total number of family names in Korea is about 250, while the total population is about
50 million. In contrast, Japan has the richest variety of family names in the world: the total
number of family names is about 132,000 and the population is about 125 million. The
creation of a new family name in Japan is also very rare. However, most
Japanese family names were created about 120 years ago [15]. The short history of family
names may have preserved the diversity and the scaling properties of family names as they
were at their creation.

In summary, we have investigated the distribution of Japanese family names for five
different regional communities in Japan. From our empirical investigation, we have found a
power-law relation between the total number of different family names and the total population
appearing in a telephone directory, with the exponent $\chi = 0.65$. We have also found that the
name-variety-size distribution shows power-law scaling with the exponent $\tau = 1.75$ and the cutoff
exponent $\alpha = 0.37$. These scaling properties are consistent across the five regional communities
and randomly generated societies with different populations. In the size-rank distribution
of family names we have obtained a crossover from one exponent, $\phi_{I} = 0.67$, to
another, $\phi_{II} = 1.33$, at the crossover point $r^* \sim S^{\alpha'}$ with $\alpha' = 0.5$. This result
holds even though the specific family names of higher rank in one community are different from
those in other communities. We have also derived scaling relations between these exponents.
FIG. 1. The number of family names $N$ against the total population $S$ for five different regional
societies in Japan shows power-law behaviour, $N \sim S^{\chi}$. By linear regression in a double logarithmic
plot we estimate the exponent $\chi = 0.65 \pm 0.03$.

FIG. 2. a) The double logarithmic plot of the histogram $n(s)$ versus family size $s$ for five regions
in Japan. b) Data collapse using the scaling form in Eq. (2). The linear fit of the power-law regime
gives $\tau = 1.75 \pm 0.05$.

FIG. 3. a) The size of a family name $s(r)$, when plotted against the rank of the family name
$r$, shows a crossover at the characteristic rank $r^*$, where $r^*$ scales as $S^{\alpha'}$. The solid
line connects the crossover points; its slope is one, implying $s(r^*) \sim r^*$. b) Data collapse using
the scaling form in Eq. (5). A crossover is observed from $\phi_{I} = 0.67 \pm 0.03$ to
$\phi_{II} = 1.33 \pm 0.03$.

FIG. 4. a) The double logarithmic plot of $N$ versus $S$ obtained from the distributions for the
randomly chosen populations $S$ = 2189, 6566, 19696, 59089 and 177267. It shows a simple power-law
relation, $N \sim S^{\chi}$ with $\chi = 0.58$. b) The double logarithmic plot of the number of family names of
the same size $s$, $n(s)$, versus the size $s$. c) Data collapse using the scaling form in Eq. (2). The linear
fit of the power-law regime gives $\tau = 1.75$.